• Linear Regression: Standard linear regression models a relationship between a dependent variable (y) and an independent variable (x) as a straight line:

y = β₀ + β₁x

Where:

β₀ is the intercept.

β₁ is the slope.

  • Introducing the Quadratic Term: Quadratic regression extends linear regression by adding a squared term of the independent variable (x²):

y = β₀ + β₁x + β₂x²

Where:

β₂ is the coefficient of the squared term.

The Curve:

The x² term introduces a curve into the relationship.

If β₂ is positive, the curve opens upward (like a U).

If β₂ is negative, the curve opens downward (like an inverted U).
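
The turning point of the curve follows directly from the quadratic equation above. As a short worked derivation (standard calculus, not part of the original analysis), setting the slope to zero gives the value of x at which the fitted curve changes direction:

```latex
\frac{dy}{dx} = \beta_1 + 2\beta_2 x = 0
\quad\Longrightarrow\quad
x^{*} = -\frac{\beta_1}{2\beta_2}
```

The slope β₁ + 2β₂x also shows that, unlike in the linear model, the effect of a one-unit change in x depends on the current value of x.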

1 Sheet 1

1.1 What is the relationship between population and IGF revenue performance patterns?

# Descriptive statistics
Cleaned_4_MMDAs_Data %>% skim(Population)
Data summary
Name Piped data
Number of rows 40
Number of columns 70
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Population 0 1 1376317 1156872 174370 348984.8 657000 2263250 3630000 ▇▁▃▂▂
Cleaned_4_MMDAs_Data %>% skim(IGF)
Data summary
Name Piped data
Number of rows 40
Number of columns 70
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
IGF 0 1 18488415 13590300 945774.9 3004224 19947114 24508545 55200507 ▇▇▇▁▁
# Histograms
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Population", x = "Population", y = "Frequency") +
  scale_x_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = IGF)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of IGF Revenue", x = "IGF Revenue", y = "Frequency") +
  scale_x_continuous(labels = comma)

# Growth Rate (Percentage)
Cleaned_4_MMDAs_Data <- Cleaned_4_MMDAs_Data %>%
  mutate(
    Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
    IGF_Growth_Rate = c(NA, diff(IGF) / IGF[-length(IGF)] * 100)
  )

# Plot of Trends
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Population)) + 
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
  labs(
    title = "Trends in Population Growth ",
    x = "Year (2012-2022)",
    y = "Population"
  ) +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = IGF)) + 
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
  labs(
    title = "Trends in IGF Revenue (Ghana Cedis) Growth ",
    x = "Year (2012-2022)",
    y = "IGF Revenue (Ghana Cedis)"
  ) +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = IGF)) +
  geom_point(color = "blue") +
  labs( title = "Population vs. IGF Revenue",
        x = "population", y = "IGF Revenue (Ghana Cedis)") +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma)

The histograms show that population and IGF revenue are unevenly distributed. The scatter plot shows three clusters of observations, and IGF revenue tends to increase as population increases.
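
The growth-rate columns computed in the code above difference consecutive rows of the whole data frame. If the 40 rows stack several assemblies over the years (an assumption; the grouping column is not shown in this report), the first row of each assembly would be compared with the last row of the previous one. A minimal sketch of a per-group calculation in base R, using toy data and a hypothetical `District` column:

```r
# Toy panel: two hypothetical districts, three years each, sorted by year within district
panel <- data.frame(
  District   = rep(c("A", "B"), each = 3),
  Year       = rep(2012:2014, times = 2),
  Population = c(100, 110, 121, 1000, 1050, 1102.5)
)

# Year-on-year percentage growth; the first element has no predecessor, so it is NA
growth_pct <- function(v) c(NA, diff(v) / head(v, -1) * 100)

# Pooled version: row 4 compares district B's first year with district A's last year
panel$Pooled_Growth <- growth_pct(panel$Population)

# Grouped version: growth is computed separately within each district
panel$Grouped_Growth <- ave(panel$Population, panel$District, FUN = growth_pct)

print(panel)
```

In the pooled column the first row of district B shows a large spurious jump, while the grouped column restarts with NA at each district boundary. The sketch assumes the rows are already ordered by year within each group.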

1.1.1 Regression Analysis

mod1 <- lm(IGF ~ Population, data = Cleaned_4_MMDAs_Data)
summary(mod1)
## 
## Call:
## lm(formula = IGF ~ Population, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -12729616 -11966444  -1782121   7985036  33634606 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)    
## (Intercept) 11547811.860  3078507.762   3.751 0.000586 ***
## Population         5.043        1.721   2.930 0.005708 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12430000 on 38 degrees of freedom
## Multiple R-squared:  0.1843, Adjusted R-squared:  0.1628 
## F-statistic: 8.584 on 1 and 38 DF,  p-value: 0.005708
Cleaned_4_MMDAs_Data %>%
  ggplot(aes(x = Population, y = IGF)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) + 
  labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and IGF Revenue") + 
  scale_y_continuous(labels = scales::comma)

# The Quadratic Term
Cleaned_4_MMDAs_Data$Population_Squared <- Cleaned_4_MMDAs_Data$Population^2

#  Quadratic Regression
mod_quad <- lm(IGF ~ Population + Population_Squared, data = Cleaned_4_MMDAs_Data)

summary(mod_quad)
## 
## Call:
## lm(formula = IGF ~ Population + Population_Squared, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -13667977 -12443856  -1170177   8596836  30208431 
## 
## Coefficients:
##                             Estimate        Std. Error t value Pr(>|t|)  
## (Intercept)        7835768.916689358 4321321.203248335   1.813   0.0779 .
## Population              13.905555380       7.484822769   1.858   0.0712 .
## Population_Squared      -0.000002653       0.000002181  -1.216   0.2316  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12360000 on 37 degrees of freedom
## Multiple R-squared:  0.2156, Adjusted R-squared:  0.1732 
## F-statistic: 5.086 on 2 and 37 DF,  p-value: 0.01118
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = IGF)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) + # Use formula for quadratic
  labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Quadratic Relationship between Population and IGF Revenue") +
  scale_y_continuous(labels = comma)

Linear Regression:

Interpretation:

The linear model shows a statistically significant positive relationship between Population and IGF (slope = 5.043, p = 0.0057): each additional person is associated with about 5.04 Ghana cedis more IGF. However, Multiple R-squared = 0.1843 indicates that Population explains only 18.43% of the variance in IGF, and Adjusted R-squared = 0.1628 is similarly low.

Quadratic Regression:

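Interpretation: The quadratic model is statistically significant overall (F = 5.086, p = 0.01118), but neither individual term is significant at the 5% level (Population: p = 0.0712; Population_Squared: p = 0.2316). R-squared improves only slightly (0.2156 versus 0.1843; adjusted 0.1732 versus 0.1628), so there is little evidence that the curve fits better than the straight line.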
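
Whether the squared term earns its place can also be checked with a nested-model F-test. The sketch below uses simulated data (not the MMDA data; `x`, `y`, `lin` and `quad` are illustrative names) to show that, when a single term is added, the F statistic from `anova()` equals the square of that term's t value, so the two tests agree:

```r
# Simulated data with a mild downward curve (illustrative only)
set.seed(1)
x <- runif(40, 1, 10)
y <- 2 + 3 * x - 0.2 * x^2 + rnorm(40)

lin  <- lm(y ~ x)            # straight-line model
quad <- lm(y ~ x + I(x^2))   # adds the squared term

# Partial F-test: does the squared term reduce the residual sum of squares?
cmp    <- anova(lin, quad)
f_stat <- cmp$F[2]

# t value of the squared term in the quadratic model
t_val <- coef(summary(quad))["I(x^2)", "t value"]

print(cmp)
print(c(F = f_stat, t_squared = t_val^2))
```

For the IGF model above, the reported t value of -1.216 for Population_Squared corresponds to F of about 1.48 on 1 and 37 degrees of freedom, with the same p-value of 0.2316.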

  • Checking Regression Assumptions
# Residual diagnostics (linear model)
ggplot(data = data.frame(residuals = residuals(mod1), fitted = fitted(mod1)), aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Linear) ", x = "Fitted Values", y = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod1)), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals(Linear)", x = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod1)), aes(sample = residuals)) +
  geom_point(stat = "qq") +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals")

#  Residuals vs. Fitted Values
ggplot(data = data.frame(residuals = residuals(mod_quad), fitted = fitted(mod_quad)), 
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Quadratic Model)", x = "Fitted Values", y = "Residuals")

#  Histogram of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals (Quadratic Model)", x = "Residuals")

#  Q-Q Plot of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(sample = residuals)) +
  geom_point(stat = "qq") +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals (Quadratic Model)")

shapiro.test(resid(mod1))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(mod1)
## W = 0.89188, p-value = 0.001119
shapiro.test(resid(mod_quad))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(mod_quad)
## W = 0.90865, p-value = 0.003446
#  Durbin-Watson Test (Autocorrelation)
dwtest(mod1)
## 
##  Durbin-Watson test
## 
## data:  mod1
## DW = 0.34349, p-value = 0.000000000001035
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(mod_quad)
## 
##  Durbin-Watson test
## 
## data:  mod_quad
## DW = 0.41706, p-value = 0.00000000001975
## alternative hypothesis: true autocorrelation is greater than 0
#  Breusch-Pagan Test (Homoscedasticity)
bptest(mod1)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod1
## BP = 0.009074, df = 1, p-value = 0.9241
bptest(mod_quad)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod_quad
## BP = 7.6576, df = 2, p-value = 0.02174
#  Variance Inflation Factor (VIF) - Multicollinearity
#  mod1 has a single predictor, so VIF is computed for the quadratic model only
vif(mod_quad)
##         Population Population_Squared 
##            19.1496            19.1496

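Both the linear and quadratic models violate linear regression assumptions. The Shapiro-Wilk test rejects normality of the residuals for both models (p = 0.0011 and p = 0.0034), and the Durbin-Watson statistics (0.34 and 0.42) indicate strong positive autocorrelation. The Breusch-Pagan test finds no evidence of heteroscedasticity in the linear model (p = 0.9241) but does in the quadratic model (p = 0.0217). The VIF of 19.15 in the quadratic model reflects the high correlation between Population and Population_Squared.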
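
The high VIF is largely a by-product of squaring an uncentered, strictly positive variable. A common remedy, sketched here on simulated values rather than the MMDA data, is to center the predictor before squaring. With only two predictors the VIF equals 1 / (1 - r²), where r is their correlation, so it can be computed in base R without extra packages (`vif_two` is a helper name introduced for this illustration):

```r
# Simulated populations over roughly the range seen in the data (illustrative only)
set.seed(42)
pop <- runif(40, 174370, 3630000)

# VIF for a model with exactly two predictors a and b
vif_two <- function(a, b) 1 / (1 - cor(a, b)^2)

# Raw variable and its square are almost collinear
vif_raw <- vif_two(pop, pop^2)

# Centering first removes most of that correlation
pop_c        <- pop - mean(pop)
vif_centered <- vif_two(pop_c, pop_c^2)

print(c(raw = vif_raw, centered = vif_centered))
```

Centering changes the meaning of the linear coefficient (it becomes the slope at the mean population) but leaves the fitted curve and the coefficient of the squared term unchanged.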

  • Transformations
# Transformed Model
log_log_mod <- lm(log(IGF) ~ log(Population), data = Cleaned_4_MMDAs_Data)
summary(log_log_mod)
## 
## Call:
## lm(formula = log(IGF) ~ log(Population), data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1580 -1.2204  0.2704  0.9180  1.5090 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      10.4923     2.5313   4.145 0.000183 ***
## log(Population)   0.4197     0.1844   2.276 0.028594 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.178 on 38 degrees of freedom
## Multiple R-squared:  0.1199, Adjusted R-squared:  0.09677 
## F-statistic: 5.178 on 1 and 38 DF,  p-value: 0.02859
# Scatter Plots (Transformed Data)
ggplot(Cleaned_4_MMDAs_Data, aes(x = log(Population), y = log(IGF))) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Log(Population) vs. Log(IGF Revenue)", x = "Log(Population)", y = "Log(IGF Revenue)")

sqrt_model <- lm(sqrt(IGF) ~ Population, data = Cleaned_4_MMDAs_Data)
summary(sqrt_model)
## 
## Call:
## lm(formula = sqrt(IGF) ~ Population, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2236.02 -1673.40   -59.75  1421.85  3075.75 
## 
## Coefficients:
##                 Estimate   Std. Error t value     Pr(>|t|)    
## (Intercept) 2909.7332986  402.2778935   7.233 0.0000000119 ***
## Population     0.0007270    0.0002249   3.232      0.00254 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1625 on 38 degrees of freedom
## Multiple R-squared:  0.2157, Adjusted R-squared:  0.195 
## F-statistic: 10.45 on 1 and 38 DF,  p-value: 0.002539

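Both transformed models produce statistically significant results: the log-log slope is 0.4197 (p = 0.0286) and the square-root model slope is 0.000727 (p = 0.0025). Their R-squared values (0.1199 and 0.2157) are measured on different response scales, so they are not directly comparable with each other or with the untransformed models.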

# Function to perform diagnostic tests and plots
perform_diagnostics <- function(model, model_name) {
  # Residuals vs. Fitted
  plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
                 aes(x = fitted, y = residuals)) +
    geom_point() +
    geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
    labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")

  # Histogram of Residuals
  plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
    geom_histogram(bins = 10, fill = "skyblue", color = "black") +
    labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")

  # Q-Q Plot of Residuals
  plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
    geom_point(stat = "qq") +
    stat_qq_line() +
    labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))

  # Durbin-Watson Test
  dw_test <- dwtest(model)
  print(paste("Durbin-Watson Test (", model_name, "):"))
  print(dw_test)

  # Breusch-Pagan Test
  bp_test <- bptest(model)
  print(paste("Breusch-Pagan Test (", model_name, "):"))
  print(bp_test)

  # Print VIF (if applicable)
  if (length(coef(model)) > 2) { # Check for multiple predictors
    vif_result <- vif(model)
    print(paste("VIF (", model_name, "):"))
    print(vif_result)
  }

  # Arrange plots
  grid.arrange(plot1, plot2, plot3, nrow = 1)
}

# Perform diagnostics for each model
perform_diagnostics(mod1, "Linear Model")
## [1] "Durbin-Watson Test ( Linear Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 0.34349, p-value = 0.000000000001035
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Linear Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.009074, df = 1, p-value = 0.9241

perform_diagnostics(log_log_mod, "Log-Log Model")
## [1] "Durbin-Watson Test ( Log-Log Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 0.22651, p-value = 0.0000000000000002973
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Log-Log Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 14.377, df = 1, p-value = 0.0001496

perform_diagnostics(sqrt_model, "Square Root Model")
## [1] "Durbin-Watson Test ( Square Root Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 0.22835, p-value = 0.0000000000000003264
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Square Root Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 7.1666, df = 1, p-value = 0.007428

cor.test(Cleaned_4_MMDAs_Data$Population, Cleaned_4_MMDAs_Data$IGF)
## 
##  Pearson's product-moment correlation
## 
## data:  Cleaned_4_MMDAs_Data$Population and Cleaned_4_MMDAs_Data$IGF
## t = 2.9299, df = 38, p-value = 0.005708
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1359439 0.6534081
## sample estimates:
##       cor 
## 0.4292744

From the analysis so far, there is a moderate and statistically significant positive linear relationship between population and IGF revenue (Pearson's product-moment correlation coefficient = 0.4293, p = 0.0057, 95% CI 0.136 to 0.653). Population size is correlated with IGF revenue performance, but the relationship is far from perfect. Because the regression assumptions are not met even after the transformations, the regression estimates for this relationship should be interpreted with caution.
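
The correlation test and the simple linear regression carry the same information. The sketch below, on simulated data with illustrative variable names, checks two identities that hold for any one-predictor linear model: the squared Pearson correlation equals R-squared, and the t statistic of the correlation test equals the t value of the slope.

```r
# Simulated data (illustrative only)
set.seed(7)
x <- rnorm(40)
y <- 1 + 0.5 * x + rnorm(40)

fit <- lm(y ~ x)

r         <- cor(x, y)                              # Pearson correlation
r_squared <- summary(fit)$r.squared                 # R-squared of the regression
t_cor     <- unname(cor.test(x, y)$statistic)       # t from the correlation test
t_slope   <- coef(summary(fit))["x", "t value"]     # t of the regression slope

print(c(r_sq_from_cor = r^2, r_sq_from_lm = r_squared))
print(c(t_cor = t_cor, t_slope = t_slope))
```

In the report, 0.4292744² is approximately 0.1843, which matches the Multiple R-squared of mod1, and the t statistic (2.93) and p-value (0.005708) are the same in both outputs.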

1.2 What is the relationship between population and DACF revenue performance patterns?

Cleaned_4_MMDAs_Data %>% skim(Population)
Data summary
Name Piped data
Number of rows 40
Number of columns 73
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Population 0 1 1376317 1156872 174370 348984.8 657000 2263250 3630000 ▇▁▃▂▂
Cleaned_4_MMDAs_Data %>% skim(DACF)
Data summary
Name Piped data
Number of rows 40
Number of columns 73
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
DACF 0 1 4031443 2372091 802346.2 2202404 3356098 6050121 9497586 ▇▇▂▅▂
# Histograms
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Population", x = "Population")

ggplot(Cleaned_4_MMDAs_Data, aes(x = DACF)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of DACF Revenue", x = "DACF Revenue")

# Plot of Trends
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Population)) + 
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
  labs(
    title = "Trends in Population Growth ",
    x = "Year (2012-2022)",
    y = "Population"
  ) +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = DACF)) + 
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
  labs(
    title = "Trends in DACF Revenue (Ghana Cedis) Growth ",
    x = "Year (2012-2022)",
    y = "DACF Revenue (Ghana Cedis)"
  ) +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = DACF)) +
  geom_point(color = "blue") +
  labs( title = "Population vs. DACF Revenue",
        x = "population", y = "DACF Revenue (Ghana Cedis)") +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma)

The histograms show an uneven distribution of population and DACF revenue; both are right-skewed. The scatter plot shows a positive relationship between population and DACF revenue.

1.2.1 Regression Analysis

mod2 <- lm(DACF ~ Population, data = Cleaned_4_MMDAs_Data)
summary(mod2)
## 
## Call:
## lm(formula = DACF ~ Population, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3261572 -1024436    13929  1022584  4732563 
## 
## Coefficients:
##                 Estimate   Std. Error t value   Pr(>|t|)    
## (Intercept) 2230187.9602  457985.5435   4.870 0.00001994 ***
## Population        1.3088       0.2561   5.111 0.00000938 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1850000 on 38 degrees of freedom
## Multiple R-squared:  0.4074, Adjusted R-squared:  0.3918 
## F-statistic: 26.12 on 1 and 38 DF,  p-value: 0.000009376
Cleaned_4_MMDAs_Data %>%
  ggplot(aes(x = Population, y = DACF)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) + # Added confidence intervals
  labs(x = "Population", y = "DACF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and DACF Revenue") +
  scale_y_continuous(labels = scales::comma)

#  Quadratic Regression (named mod2_quad so the IGF model mod_quad is not overwritten)
mod2_quad <- lm(DACF ~ Population + Population_Squared, data = Cleaned_4_MMDAs_Data)

summary(mod2_quad)
## 
## Call:
## lm(formula = DACF ~ Population + Population_Squared, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3484317 -1027457    34632  1177382  4253610 
## 
## Coefficients:
##                              Estimate         Std. Error t value Pr(>|t|)  
## (Intercept)        1722652.9777718000  644870.5054404821   2.671   0.0112 *
## Population               2.5205136256       1.1169596554   2.257   0.0300 *
## Population_Squared      -0.0000003627       0.0000003255  -1.114   0.2723  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1844000 on 37 degrees of freedom
## Multiple R-squared:  0.4266, Adjusted R-squared:  0.3957 
## F-statistic: 13.77 on 2 and 37 DF,  p-value: 0.00003395

The regression results show a statistically significant positive linear relationship between population and DACF revenue performance (p-value: 0.000009376, R-squared: 0.4074, Adjusted R-squared: 0.3918). The Population coefficient of 1.3088 means that each additional person is associated with about 1.31 Ghana cedis more DACF revenue, and Population explains 40.74% of the variation in DACF revenue. The quadratic model is also significant overall (p-value: 0.00003395), but its squared term is not (p = 0.2723), and it raises the Adjusted R-squared only marginally (0.3957).

  • Checking Regression Assumptions
#  Residual 
ggplot(data = data.frame(residuals = residuals(mod2),
                        fitted = fitted(mod2)),
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted",
       x = "Fitted Values", y = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod2)),
       aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals", x = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod2)),
       aes(sample = residuals)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals ")

shapiro.test(resid(mod2))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(mod2)
## W = 0.97786, p-value = 0.6104
# Autocorrelation
dwtest(mod2)
## 
##  Durbin-Watson test
## 
## data:  mod2
## DW = 1.6924, p-value = 0.1292
## alternative hypothesis: true autocorrelation is greater than 0
# Homoscedasticity (Constant Variance of Residuals)

bptest(mod2)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod2
## BP = 3.6303, df = 1, p-value = 0.05674
# Multicollinearity
# mod2 is a simple linear regression with a single predictor (Population), so multicollinearity is not an issue.

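The tests show that the assumptions of linear regression are met at the 5% level: the residuals are consistent with normality (Shapiro-Wilk p = 0.6104), show no significant autocorrelation (Durbin-Watson = 1.69, p = 0.1292), and show no significant heteroscedasticity (Breusch-Pagan p = 0.05674, although this is borderline).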

  • Transformation.
# Transformed Models
log_mod2 <- lm(log(DACF) ~ log(Population), data = Cleaned_4_MMDAs_Data)
summary(log_mod2)
# 
# Call:
# lm(formula = log(DACF) ~ log(Population), data = Cleaned_4_MMDAs_Data)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -1.13736 -0.30598  0.09786  0.33927  0.83955 
# 
# Coefficients:
#                 Estimate Std. Error t value        Pr(>|t|)    
# (Intercept)       9.7185     1.0773   9.021 0.0000000000552 ***
# log(Population)   0.3879     0.0785   4.941 0.0000159394618 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 0.5015 on 38 degrees of freedom
# Multiple R-squared:  0.3912,  Adjusted R-squared:  0.3752 
# F-statistic: 24.42 on 1 and 38 DF,  p-value: 0.00001594
sqrt_mod2 <- lm(sqrt(DACF) ~ sqrt(Population), data = Cleaned_4_MMDAs_Data)
summary(sqrt_mod2)
# 
# Call:
# lm(formula = sqrt(DACF) ~ sqrt(Population), data = Cleaned_4_MMDAs_Data)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -855.03 -282.33   58.54  313.93  910.86 
# 
# Coefficients:
#                   Estimate Std. Error t value     Pr(>|t|)    
# (Intercept)      1128.3519   166.6564   6.771 0.0000000504 ***
# sqrt(Population)    0.7492     0.1421   5.274 0.0000056320 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 452.1 on 38 degrees of freedom
# Multiple R-squared:  0.4226,  Adjusted R-squared:  0.4074 
# F-statistic: 27.81 on 1 and 38 DF,  p-value: 0.000005632
#  Scatter Plots (Transformed Data)
ggplot(Cleaned_4_MMDAs_Data, aes(x = log(Population), y = log(DACF))) +
  geom_point() +
  geom_smooth(method = "lm")+
  labs(title = "Log(Population) vs. Log(DACF Revenue)",
       x = "Log(Population)", y = "Log(DACF Revenue)")

ggplot(Cleaned_4_MMDAs_Data, aes(x = sqrt(Population), y = sqrt(DACF))) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Sqrt(Population) vs. Sqrt(DACF Revenue)",
       x = "Sqrt(Population)", y = "Sqrt(DACF Revenue)")

# perform_diagnostics() is defined in Section 1.1.1 and is reused here unchanged

# Perform diagnostics for each model
perform_diagnostics(mod2, "Linear Model")
## [1] "Durbin-Watson Test ( Linear Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 1.6924, p-value = 0.1292
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Linear Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 3.6303, df = 1, p-value = 0.05674

perform_diagnostics(log_mod2, "Log-Log Model")
## [1] "Durbin-Watson Test ( Log-Log Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 1.6013, p-value = 0.06482
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Log-Log Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.010191, df = 1, p-value = 0.9196

perform_diagnostics(sqrt_mod2, "Square Root Model")
## [1] "Durbin-Watson Test ( Square Root Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 1.6579, p-value = 0.1073
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Square Root Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 2.0396, df = 1, p-value = 0.1533

shapiro.test(resid(mod2))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(mod2)
## W = 0.97786, p-value = 0.6104
shapiro.test(resid(log_mod2))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(log_mod2)
## W = 0.94962, p-value = 0.07356
shapiro.test(resid(sqrt_mod2))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(sqrt_mod2)
## W = 0.9674, p-value = 0.2968

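Both the log-log and square-root models are statistically significant. The square-root model has a slightly higher R-squared (0.4226) than the linear model (0.4074), while the log-log model's is slightly lower (0.3912); because the response is on a different scale in each model, these values are only roughly comparable.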

None of the assumptions is violated at the 5% significance level for any of the three models.

The regression analysis shows that the linear, log-log, and square-root models are all statistically significant and that all assumptions are met. In the linear model, each additional person is associated with an increase of 1.3088 Ghana cedis in DACF. In the log-log model, a 1% increase in population is associated with an increase of about 0.39% in DACF. In the square-root model, a one-unit increase in the square root of Population is associated with a 0.7492-unit increase in the square root of DACF.

Given these models, it can be concluded that changes in population help predict changes in DACF revenue performance, and the observed pattern is unlikely to be due to chance.
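
The percentage reading of the log-log coefficient is an approximation that holds for small changes. As a worked check using the reported estimate of 0.3879 (the symbol p for the proportional change in population is introduced here for illustration), the exact multiplicative effect on predicted DACF is:

```latex
\frac{\mathrm{DACF}_{\text{new}}}{\mathrm{DACF}_{\text{old}}} = (1 + p)^{0.3879},
\qquad
(1.01)^{0.3879} \approx 1.0039,
\qquad
(1.10)^{0.3879} \approx 1.0377
```

So a 1% rise in population corresponds to about a 0.39% rise in predicted DACF, and a 10% rise to about 3.8%, slightly less than ten times the 1% effect.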

1.3 What is the relationship between population, recurrent and capital expenditure?

  • Descriptive Statistics
Cleaned_4_MMDAs_Data %>% skim(Capital_Expenditure)
Data summary
Name Piped data
Number of rows 40
Number of columns 73
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Capital_Expenditure 0 1 10950443 9929727 895337.7 4780086 7434365 11857008 46223724 ▇▃▂▁▁
Cleaned_4_MMDAs_Data %>% skim(Recrrent_Expenditure)
Data summary
Name Piped data
Number of rows 40
Number of columns 73
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Recrrent_Expenditure 9 0.78 11583810 7734814 864055.3 3741295 13550852 17984476 24388461 ▇▁▃▅▃
# Capital Expenditure Histogram
cap_hist <- ggplot(Cleaned_4_MMDAs_Data, aes(x = Capital_Expenditure)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "dodgerblue", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Capital Expenditure", x = "Capital Expenditure (Ghana Cedis)", y = "Density") +
  scale_x_continuous(labels = comma) 



recu_hist <- ggplot(Cleaned_4_MMDAs_Data, aes(x = Recrrent_Expenditure)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "dodgerblue", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of  Recurrent Expenditure ", x = "Recurrent Expenditure (Ghana Cedis)", y = "Density") +
  scale_x_continuous(labels = comma) 


# Population Histogram
pop_hist <- ggplot(Cleaned_4_MMDAs_Data, aes(x = Population)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "dodgerblue", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Population", x = "Population", y = "Density") +
  scale_x_continuous(labels = comma) 

cap_hist

recu_hist

pop_hist

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Population)) + 
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
  labs(
    title = "Population Trend",
    x = "Year (2012-2022)",
    y = "Population"
  ) +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Capital_Expenditure)) + 
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
  labs(
    title = "Capital Expenditure Trend",
    x = "Year (2012-2022)",
    y = "Capital Expenditure"
  ) +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Recrrent_Expenditure)) + 
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
  labs(
    title = "Recurrent Expenditure Trend",
    x = "Year (2012-2022)",
    y = "Recurrent Expenditure"
  ) +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = Capital_Expenditure)) +
  geom_point(color = "blue") +
  labs( title = "Population vs. Capital Expenditure",
        x = "population", y = "Capital Expenditure (Ghana Cedis)") +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma) 

ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = Recrrent_Expenditure)) +
  geom_point(color = "blue") +
  labs( title = "Population vs. Recurrent Expenditure",
        x = "population", y = "Recurrent Expenditure (Ghana Cedis)") +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma) 

# Calculate Per Capita Values
Cleaned_4_MMDAs_Data$Capital_Exp_Per_Capita <- Cleaned_4_MMDAs_Data$Capital_Expenditure / Cleaned_4_MMDAs_Data$Population



# Per Capita Analysis 
average_capita <- mean(Cleaned_4_MMDAs_Data$Capital_Exp_Per_Capita)

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year)) +
  geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita"), color = "blue") +
  labs(title = "Capital Expenditure Per Capita Over Time", x = "Year (2012 - 2022) ", y = "Ghana Cedis Per Capita", color = "Type") +
  scale_y_continuous(labels = comma) 

1.3.1 Regression Results

mod3 <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population, data = Cleaned_4_MMDAs_Data)
summary(mod3)
## Response Capital_Expenditure :
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -11328484  -4446239  -2234438   2144594  31316925 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)  
## (Intercept) 6365461.982 2363138.031   2.694   0.0116 *
## Population        3.097       1.316   2.353   0.0256 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9019000 on 29 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.1603, Adjusted R-squared:  0.1314 
## F-statistic: 5.538 on 1 and 29 DF,  p-value: 0.02561
## 
## 
## Response Recrrent_Expenditure :
## 
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -8040592 -5802868 -1976070  5056903 15867100 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)    
## (Intercept) 7440523.2478 1769853.4264   4.204 0.000229 ***
## Population        3.1691       0.9856   3.215 0.003192 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6755000 on 29 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.2628, Adjusted R-squared:  0.2374 
## F-statistic: 10.34 on 1 and 29 DF,  p-value: 0.003192
mod_cap <- lm(Capital_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
summary(mod_cap)
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -12766743  -5251934  -2102442   3645339  30310330 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)   
## (Intercept) 6006767.436 2261882.225   2.656   0.0115 * 
## Population        3.592       1.265   2.840   0.0072 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9136000 on 38 degrees of freedom
## Multiple R-squared:  0.1751, Adjusted R-squared:  0.1534 
## F-statistic: 8.068 on 1 and 38 DF,  p-value: 0.007202
mod_rec <- lm(Recrrent_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
summary(mod_rec)
## 
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -8040592 -5802868 -1976070  5056903 15867100 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)    
## (Intercept) 7440523.2478 1769853.4264   4.204 0.000229 ***
## Population        3.1691       0.9856   3.215 0.003192 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6755000 on 29 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.2628, Adjusted R-squared:  0.2374 
## F-statistic: 10.34 on 1 and 29 DF,  p-value: 0.003192
Cleaned_4_MMDAs_Data %>% 
  ggplot(aes(x = Population, y = Capital_Expenditure)) +
  geom_point()+
  geom_smooth(method = "lm", se = TRUE) + labs(x = "Population", y = "Capital Expenditure", title = "Linear Relationship Population and Capital Expenditure")+
   scale_y_continuous(labels = scales::comma)

Cleaned_4_MMDAs_Data %>%
  ggplot(aes(x = Population, y = Recrrent_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Population", y = "Recurrent Expenditure", title = "Linear Relationship Population and Recurrent Expenditure") +
  scale_y_continuous(labels = scales::comma)

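The linear regressions show a significant positive relationship between Population and both expenditure types. Each additional resident is associated with about 3.59 Ghana Cedis more Capital Expenditure (p = 0.0072, R-squared = 0.175) and about 3.17 Ghana Cedis more Recurrent Expenditure (p = 0.0032, R-squared = 0.263), so population explains only a modest share of the variation in either.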

  • Checking Regression Assumptions
# Diagnostic Function
perform_diagnostics <- function(model, model_name) {
  # Residuals vs. Fitted
  plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
                 aes(x = fitted, y = residuals)) +
    geom_point() +
    geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
    labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")

  # Histogram of Residuals
  plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
    geom_histogram(bins = 10, fill = "skyblue", color = "black") +
    labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")

  # Q-Q Plot of Residuals
  plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
    geom_point(stat = "qq") +
    stat_qq_line() +
    labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))

  # Durbin-Watson Test
  dw_test <- dwtest(model)
  print(paste("Durbin-Watson Test (", model_name, "):"))
  print(dw_test)

  # Breusch-Pagan Test
  bp_test <- bptest(model)
  print(paste("Breusch-Pagan Test (", model_name, "):"))
  print(bp_test)

  # Print VIF (if applicable)
  if (length(coef(model)) > 2) { # Check for multiple predictors
    vif_result <- vif(model)
    print(paste("VIF (", model_name, "):"))
    print(vif_result)
  }

  # Arrange plots
  grid.arrange(plot1, plot2, plot3, nrow = 1)
}

#  Perform Diagnostics
# Capital Expenditure
perform_diagnostics(mod_cap, "Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Capital Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 0.86878, p-value = 0.00002011
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Capital Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 8.1362, df = 1, p-value = 0.004339

perform_diagnostics(mod_rec, "Recurrent Expenditure Model")
## [1] "Durbin-Watson Test ( Recurrent Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 0.52471, p-value = 0.0000000961
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Recurrent Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 6.3745, df = 1, p-value = 0.01158

# Recurrent Expenditure

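Both linear models violate key assumptions of linear regression. The Durbin-Watson tests (DW = 0.87 and 0.52, both p < 0.001) indicate positive autocorrelation in the residuals, and the Breusch-Pagan tests (p = 0.0043 and p = 0.0116) indicate heteroscedasticity.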
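To make the Durbin-Watson values easier to read, the statistic can be computed by hand from a model's residuals. The sketch below uses simulated data with deliberately autocorrelated errors. The helper name `durbin_watson_stat` and the simulated series are assumptions for illustration, not part of the analysis above.

```r
# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by the residual sum of squares. Values near 2 suggest no
# first-order autocorrelation; values well below 2 suggest positive
# autocorrelation.
durbin_watson_stat <- function(res) {
  sum(diff(res)^2) / sum(res^2)
}

set.seed(42)
n <- 40
x <- seq_len(n)

# Simulated AR(1) errors with coefficient 0.8 (hypothetical data).
e <- numeric(n)
e[1] <- rnorm(1)
for (i in 2:n) e[i] <- 0.8 * e[i - 1] + rnorm(1)

y <- 2 + 0.5 * x + e
fit <- lm(y ~ x)

dw_sim <- durbin_watson_stat(residuals(fit))
dw_sim
```

The statistic always lies between 0 and 4, so values such as 0.87 and 0.52 sit far toward the positive-autocorrelation end of the scale.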

  • Quadratic model
Cleaned_4_MMDAs_Data$Recrrent_Expenditure_squared <- Cleaned_4_MMDAs_Data$Recrrent_Expenditure^2

Cleaned_4_MMDAs_Data$Capital_Expenditure_squared <- Cleaned_4_MMDAs_Data$Capital_Expenditure^2

mod_quad <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population + Population_Squared, data = Cleaned_4_MMDAs_Data)

# View the summary
summary(mod_quad)
## Response Capital_Expenditure :
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population + Population_Squared, 
##     data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -11103603  -5642278  -2484793   2604630  29136527 
## 
## Coefficients:
##                             Estimate        Std. Error t value Pr(>|t|)  
## (Intercept)        2092421.801843737 3696487.467239643   0.566   0.5759  
## Population              14.211410888       7.604401699   1.869   0.0721 .
## Population_Squared      -0.000003182       0.000002145  -1.483   0.1492  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8838000 on 28 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.2215, Adjusted R-squared:  0.1659 
## F-statistic: 3.983 on 2 and 28 DF,  p-value: 0.03004
## 
## 
## Response Recrrent_Expenditure :
## 
## Call:
## lm(formula = Recrrent_Expenditure ~ Population + Population_Squared, 
##     data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -7887901 -4481014 -1944637  4240246 15154962 
## 
## Coefficients:
##                              Estimate         Std. Error t value Pr(>|t|)    
## (Intercept)        11010190.152475385  2741774.699953048   4.016 0.000403 ***
## Population               -6.115856635        5.640369776  -1.084 0.287480    
## Population_Squared        0.000002658        0.000001591   1.670 0.105992    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6555000 on 28 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.3296, Adjusted R-squared:  0.2817 
## F-statistic: 6.883 on 2 and 28 DF,  p-value: 0.003704
# Scatter plots with fitted quadratic curves
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = Capital_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
  labs(x = "Population", y = "Capital Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Capital Expenditure") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = Recrrent_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
  labs(x = "Population", y = "Recurrent Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Recurrent Expenditure") +
  scale_y_continuous(labels = comma)

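The quadratic models fit slightly better than the linear ones: adjusted R-squared rises from 0.1314 to 0.1659 for Capital Expenditure and from 0.2374 to 0.2817 for Recurrent Expenditure (both on the 31 complete observations). The overall F-tests remain significant (p = 0.030 and p = 0.0037), but the squared term is not individually significant in either model (p = 0.149 and p = 0.106), so the evidence for curvature is weak.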
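For a quadratic y = β₀ + β₁x + β₂x², the curve turns at x = -β₁ / (2β₂). As a rough sketch, the turning points implied by the coefficients printed in the summary above can be computed directly. The variable names are illustrative, and the results inherit the rounding and the large standard errors of the estimates.

```r
# Coefficients copied from the quadratic model summary above.
b1_cap <- 14.211410888    # Population, Capital_Expenditure response
b2_cap <- -0.000003182    # Population_Squared, Capital_Expenditure response
b1_rec <- -6.115856635    # Population, Recrrent_Expenditure response
b2_rec <- 0.000002658     # Population_Squared, Recrrent_Expenditure response

# Turning point of a quadratic: x = -b1 / (2 * b2).
turning_point_cap <- -b1_cap / (2 * b2_cap)  # peak, because b2_cap < 0
turning_point_rec <- -b1_rec / (2 * b2_rec)  # trough, because b2_rec > 0

round(c(capital = turning_point_cap, recurrent = turning_point_rec))
```

The implied turning points are populations of about 2.23 million (capital) and 1.15 million (recurrent). Both fall inside the observed population range, but because neither squared term is significant they should be read as indicative only.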

  • Transformations
# Log transformation for Capital Expenditure

Cleaned_4_MMDAs_Data$Capital_Expenditure_adjusted <- Cleaned_4_MMDAs_Data$Capital_Expenditure + 1
log_cap_mod <- lm(log(Capital_Expenditure_adjusted) ~ Population, data = Cleaned_4_MMDAs_Data)
summary(log_cap_mod)
## 
## Call:
## lm(formula = log(Capital_Expenditure_adjusted) ~ Population, 
##     data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.84433 -0.41447 -0.02797  0.66523  1.35168 
## 
## Coefficients:
##                  Estimate    Std. Error t value             Pr(>|t|)    
## (Intercept) 15.4252198367  0.1993591208  77.374 < 0.0000000000000002 ***
## Population   0.0000003162  0.0000001115   2.837              0.00727 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8053 on 38 degrees of freedom
## Multiple R-squared:  0.1748, Adjusted R-squared:  0.1531 
## F-statistic: 8.048 on 1 and 38 DF,  p-value: 0.007265
perform_diagnostics(log_cap_mod, "Log capital Expenditure Model")
## [1] "Durbin-Watson Test ( Log capital Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 0.52255, p-value = 0.000000002959
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Log capital Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.21996, df = 1, p-value = 0.6391

Cleaned_4_MMDAs_Data$Ln_Population <- log(Cleaned_4_MMDAs_Data$Population)
Cleaned_4_MMDAs_Data$Ln_Capital_Expenditure <- log(Cleaned_4_MMDAs_Data$Capital_Expenditure)



  

ggplot(Cleaned_4_MMDAs_Data, aes(x = log(Population), y = log(Capital_Expenditure))) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)+
  labs(title = "Log(Population) vs. Log(Capital Expenditure)",
       x = "Log(Population)", y = "Log(Capital Expenditure)")

#  Square root transformation for Capital Expenditure
sqrt_cap_mod <- lm(sqrt(Capital_Expenditure) ~ Population, data = Cleaned_4_MMDAs_Data)
summary(sqrt_cap_mod)
## 
## Call:
## lm(formula = sqrt(Capital_Expenditure) ~ Population, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2170.2  -824.0  -256.1   755.6  3076.7 
## 
## Coefficients:
##                 Estimate   Std. Error t value      Pr(>|t|)    
## (Intercept) 2364.5993423  298.4236837   7.924 0.00000000144 ***
## Population     0.0004922    0.0001668   2.950       0.00541 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1205 on 38 degrees of freedom
## Multiple R-squared:  0.1863, Adjusted R-squared:  0.1649 
## F-statistic: 8.703 on 1 and 38 DF,  p-value: 0.005414
perform_diagnostics(sqrt_cap_mod, "Square root Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Square root Capital Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 0.72057, p-value = 0.000000898
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Square root Capital Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 7.145, df = 1, p-value = 0.007517

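Across the linear, log, and square root specifications above, the relationship between Population and Capital Expenditure is positive and statistically significant, and the linear model shows the same for Recurrent Expenditure. The transformations do not remove the autocorrelation (all Durbin-Watson p-values are below 0.001), and only the log model passes the Breusch-Pagan test (p = 0.6391).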
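The slopes of the transformed models are very small because they are expressed per person. A short sketch of how they can be rescaled follows. The step of 100,000 people is an arbitrary choice for illustration, and the coefficients are copied from the summaries above.

```r
b_log  <- 0.0000003162   # slope of log(Capital_Expenditure_adjusted) ~ Population
b_sqrt <- 0.0004922      # slope of sqrt(Capital_Expenditure) ~ Population
delta_pop <- 100000      # illustrative population increase (assumption)

# Log-linear model: a change of delta_pop multiplies the response by
# exp(b * delta_pop), so the percentage change is (exp(b * delta_pop) - 1) * 100.
pct_change_cap <- (exp(b_log * delta_pop) - 1) * 100
pct_change_cap

# Square root model: the change is on the square-root scale of the response.
sqrt_scale_change <- b_sqrt * delta_pop
sqrt_scale_change
```

Under the log model, 100,000 more residents correspond to roughly 3.2% more capital expenditure. The square root slope has no constant interpretation on the original scale, because the implied change in cedis depends on the starting level.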

1.4 What is the relationship between revenue growth and infrastructure delivery (Model)

Using total revenue growth rate and infrastructure delivery (capital expenditure per capita).

# Descriptive statistics
Cleaned_4_MMDAs_Data %>% skim(Capital_Exp_Per_Capita)
Data summary
Name Piped data
Number of rows 40
Number of columns 78
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Capital_Exp_Per_Capita 0 1 14.11 14.5 0.73 4.36 8.97 18.44 58.96 ▇▃▂▁▁
Cleaned_4_MMDAs_Data %>% skim(TtRev_Growth_Rate)
Data summary
Name Piped data
Number of rows 40
Number of columns 78
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
TtRev_Growth_Rate 3 0.92 2.85 25.27 -81.19 -11.06 7.9 18.14 40.94 ▁▁▃▇▆
# Histograms
ggplot(Cleaned_4_MMDAs_Data, aes(x = Capital_Exp_Per_Capita)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Capital expenditure per capita", x = "Capital expenditure per capita") +
  scale_x_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = TtRev_Growth_Rate)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Total Revenue Growth Rate", x = "Total revenue growth rate") 

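The histograms show skewed distributions: Capital expenditure per capita is right-skewed (mean 14.11, median 8.97), while Total revenue growth rate is left-skewed (mean 2.85, median 7.9) with a minimum of -81.19.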

1.4.1 Regression results

mod5 <- lm(Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_4_MMDAs_Data)
summary(mod5)
## 
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.180 -10.176  -4.637   6.675  44.986 
## 
## Coefficients:
##                   Estimate Std. Error t value   Pr(>|t|)    
## (Intercept)       14.57674    2.46733   5.908 0.00000102 ***
## TtRev_Growth_Rate  0.08449    0.09835   0.859      0.396    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.91 on 35 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.02065,    Adjusted R-squared:  -0.007331 
## F-statistic: 0.738 on 1 and 35 DF,  p-value: 0.3961
ggplot(Cleaned_4_MMDAs_Data, aes(x = TtRev_Growth_Rate, y = Capital_Exp_Per_Capita)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)+
  labs(title = "Revenue Growth vs. Capital Expenditure (Per Capita)",
       x = "Total Revenue Growth Rate (%)",
       y = "Capital Expenditure Per Capita")

The regression results show no statistically significant relationship between total revenue growth rate and infrastructure delivery (capital expenditure per capita): the p-value (0.3961) is greater than the 0.05 significance level. Changes in revenue growth therefore do not significantly predict changes in capital expenditure per capita in this model. The R-squared (0.02065) indicates that only 2.07% of the variation in capital expenditure per capita is explained by the total revenue growth rate.
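With a single predictor, the slope's t-test and the overall F-test are the same test: the squared t value equals the F statistic (here 0.859² ≈ 0.738), and R-squared equals the squared Pearson correlation. The sketch below checks both identities on simulated data. The variables `growth` and `capex_pc` are hypothetical stand-ins, not the MMDA columns.

```r
set.seed(7)

# Simulated stand-ins for the two variables (hypothetical data, 37 rows to
# mirror the number of complete observations in the model above).
growth   <- rnorm(37, mean = 3, sd = 25)
capex_pc <- 14 + rnorm(37, sd = 15)

fit <- lm(capex_pc ~ growth)
s <- summary(fit)

t_slope <- s$coefficients["growth", "t value"]
f_stat  <- unname(s$fstatistic["value"])

# Identity 1: t^2 equals F in simple linear regression.
c(t_squared = t_slope^2, F = f_stat)

# Identity 2: R-squared equals the squared correlation.
c(r_squared = s$r.squared, cor_squared = cor(growth, capex_pc)^2)
```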

1.5 What is the relationship between expenditure growth and infrastructure delivery?

  • Regression results using expenditure growth (Expenditure_Growth) and infrastructure delivery (capital expenditure per capita).
Cleaned_4_MMDAs_Data$Expenditure_Growth <- c(NA, diff(Cleaned_4_MMDAs_Data$Total_Expenditure) / Cleaned_4_MMDAs_Data$Total_Expenditure[-nrow(Cleaned_4_MMDAs_Data)]) * 100




  
  ggplot(Cleaned_4_MMDAs_Data, aes(x = Expenditure_Growth, y = Capital_Exp_Per_Capita)) +
    geom_point() + geom_smooth(method = "lm", se = TRUE)+
    labs(title = "Relationship Expenditure Growth vs. Capital Expenditure (Per Capita)",
         x = "Expenditure Growth Rate (%)",
         y = "Capital Expenditure Per Capita")

There is no statistically significant linear relationship between expenditure growth and capital expenditure per capita.
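The growth rate above is computed with `diff()` over the whole column, so if the rows of the four MMDAs are stacked, the first year of each MMDA is compared with the last year of the previous one. A possible per-group alternative is sketched below on toy data. The grouping column `MMDA` and the toy values are assumptions, since the actual identifier column is not shown in this section.

```r
# Toy panel (hypothetical): two units, three years each, sorted by unit and year.
toy <- data.frame(
  MMDA = rep(c("A", "B"), each = 3),
  Year = rep(2012:2014, times = 2),
  Total_Expenditure = c(100, 110, 121, 1000, 900, 990)
)

# Year-on-year percentage growth within one unit; the first year has no
# predecessor and is set to NA.
growth_pct <- function(v) c(NA, diff(v) / head(v, -1)) * 100

# ave() applies the function separately within each MMDA and returns the
# results in the original row order.
toy$Expenditure_Growth <- ave(toy$Total_Expenditure, toy$MMDA, FUN = growth_pct)
toy
```

With this approach each unit's first year is missing instead of being filled with a cross-unit comparison. It assumes the rows are ordered by year within each unit.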

2 SHEET 2

2.1 What is the relationship between allocative and funding decision-making and revenue patterns?

# no variables

2.2 What is the relationship between allocative decision-making and expenditure patterns?

  • No direct variables are available for this question; descriptive statistics for closely related variables are shown below.
# Trends of Revenue and Expenditure over the years.


ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Total_Revenue)) + 
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
  labs(title = "Total Revenue Trend",
       x = "Year (2012-2022)",
       y = "Amount (Ghana Cedis)") +
 scale_y_continuous(labels = comma) 

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Total_Revenue)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  labs(title = "Total Revenue Trend",
       x = "Year",
       y = "Amount (Ghana Cedis)") +
 scale_y_continuous(labels = comma) 

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Total_Expenditure)) + 
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
  labs(
    title = "Trends in Total Expenditure Growth ",
    x = "Year (2012-2022)",
    y = "Amount (Ghana Cedis)"
  ) +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Total_Expenditure)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  labs(title = "Total Expenditure Trend",
       x = "Year",
       y = "Amount (Ghana Cedis)") +
 scale_y_continuous(labels = comma) 

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year)) +
  geom_point(aes(y = Total_Revenue, color = "Total Revenue")) +
  geom_point(aes(y = Total_Expenditure, color = "Total Expenditure")) +
  labs(title = "Revenue Vs. Expenditure Trends Over Years",
       x = "Year",
       y = "Amount (Ghana Cedis)", color = "Type") +
  scale_color_manual(values = c("Total Revenue" = "blue", "Total Expenditure" = "red")) +
  scale_y_continuous(labels = comma) 

ggplot(Cleaned_4_MMDAs_Data, aes(x = Total_Revenue, y = Total_Expenditure)) +
  geom_point(color = "blue") +
  labs( title = "Total Revenue  Vs. Total Expenditure (Ghana Cedis)",
        x = "Total Revenue", y = "Total Expenditure ") +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_y_continuous(labels = comma) +
  scale_x_continuous(labels = comma) 

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year)) +
  geom_point(aes(y = IGF, color = "IGF")) +
  geom_point(aes(y = DACF, color = "DACF")) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_point(aes(y = Others_Sources, color = "Other Sources")) +
  labs(
    title = "Revenue  Trends",
    x = "Year",
    y = "Amount (Ghana Cedis)",
    color = "Type"
  ) +
  scale_color_manual(
    values = c(
      "Total Revenue" = "#0000FF",  # Blue
      "Other Sources" = "#87CEEB",  # Light Blue
      "IGF" = "#00CD66",  # Green
      "DACF" = "#808080",  # Gray
      "Capital Expenditure" = "#9370DB",  # Purple
      "Total Expenditure" = "#FF0000",  # Red
      "Recurrent Expenditure" = "#FFD700"  # Yellow
    )
  ) +
  scale_y_continuous(labels = comma, breaks = seq(0, 60000000, 10000000)) + # Added breaks
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

# IGF to Total Expenditure Ratio 
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = IGF_TE)) +
  geom_point(size = 2.5) +
  labs(
    title = "IGF to Total Expenditure Ratio Over Years",
    x = "Year",
    y = "Ratio (IGF/Total Expenditure)"
  ) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) 

cor.test(Cleaned_4_MMDAs_Data$Total_Expenditure, Cleaned_4_MMDAs_Data$Total_Revenue)
## 
##  Pearson's product-moment correlation
## 
## data:  Cleaned_4_MMDAs_Data$Total_Expenditure and Cleaned_4_MMDAs_Data$Total_Revenue
## t = 39.799, df = 38, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9776727 0.9937968
## sample estimates:
##       cor 
## 0.9882165
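The t statistic reported by `cor.test()` follows from the correlation and the degrees of freedom as t = r·sqrt(df) / sqrt(1 − r²). A quick check using the values printed above (the variable names are illustrative):

```r
r  <- 0.9882165   # sample correlation from the output above
df <- 38          # degrees of freedom (n - 2, with n = 40)

# t statistic for testing that the true correlation is zero.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
t_stat

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
p_value
```

This reproduces the reported t of about 39.8, which confirms that the very small p-value is driven by the near-perfect correlation and not by the sample size.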

2.3 What is the relationship between population trend, service delivery and revenue and expenditure patterns?

# Revenue Per Capita
Cleaned_4_MMDAs_Data$Total_Revenue_Per_Capita <- Cleaned_4_MMDAs_Data$Total_Revenue / Cleaned_4_MMDAs_Data$Population
Cleaned_4_MMDAs_Data$IGF_Per_Capita <- Cleaned_4_MMDAs_Data$IGF / Cleaned_4_MMDAs_Data$Population
Cleaned_4_MMDAs_Data$DACF_Per_Capita <- Cleaned_4_MMDAs_Data$DACF / Cleaned_4_MMDAs_Data$Population




ggplot(Cleaned_4_MMDAs_Data, aes(x = Year)) +
  geom_point(aes(y = IGF, color = "IGF")) +
  geom_point(aes(y = DACF, color = "DACF")) +
  geom_point(aes(y = Others_Sources, color = "Other Sources")) +
  labs(
    title = "Revenue  Trends",
    x = "Year",
    y = "Amount (Ghana Cedis)",
    color = "Type"
  ) +
  scale_color_manual(
    values = c(
      "Total Revenue" = "#0000FF",  # Blue
      "Other Sources" = "#87CEEB",  # Light Blue
      "IGF" = "#00CD66",  # Green
      "DACF" = "#808080",  # Gray
      "Capital Expenditure" = "#9370DB",  # Purple
      "Total Expenditure" = "#FF0000",  # Red
      "Recurrent Expenditure" = "#FFD700"  # Yellow
    )
  ) +
  scale_y_continuous(labels = comma, breaks = seq(0, 60000000, 10000000)) + # Added breaks
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

# Population Trend


ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Total_Expenditure)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  geom_point()+
  labs(title = "Total Expenditure Trend",
       x = "Year",
       y = "Amount (Ghana Cedis)") +
 scale_y_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Population)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  geom_point()+
  labs(title = "Population Trend",
       x = "Year",
       y = "Population") 

ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = IGF)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  geom_point()+
  labs(title = "IGF Trend",
       x = "Year",
       y = "IGF") +
  scale_y_continuous(labels = comma) 

# Per capita plot
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
  geom_point(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
  geom_line(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
  geom_point(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
  geom_line(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
  geom_point(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
  labs(title = "Revenue Per Capita trends", x = "Year", y = "Amount (Ghana Cedis)", color = "Type") +
  scale_y_continuous(labels = comma) 

cor_matrix <- cor(Cleaned_4_MMDAs_Data[, c("Population", "Total_Revenue", "Total_Expenditure", "IGF_TE", "IGF")], use = "complete.obs")
print(cor_matrix)
##                   Population Total_Revenue Total_Expenditure    IGF_TE
## Population         1.0000000     0.5299488         0.5520108 0.1521863
## Total_Revenue      0.5299488     1.0000000         0.9882165 0.4711984
## Total_Expenditure  0.5520108     0.9882165         1.0000000 0.4196154
## IGF_TE             0.1521863     0.4711984         0.4196154 1.0000000
## IGF                0.4292744     0.9361730         0.9130142 0.6904556
##                         IGF
## Population        0.4292744
## Total_Revenue     0.9361730
## Total_Expenditure 0.9130142
## IGF_TE            0.6904556
## IGF               1.0000000
corrplot(cor_matrix, main = "Correlation matrix of population and expenditure patterns")

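The matrix shows a very strong positive correlation between Total Revenue and Total Expenditure (r = 0.99), and IGF is strongly correlated with both (r = 0.94 and r = 0.91). Population is moderately correlated with Total Revenue (r = 0.53), Total Expenditure (r = 0.55), and IGF (r = 0.43), and only weakly with IGF_TE (r = 0.15).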

2.3.1 Regression Analysis

# Total Revenue vs Population
model_revenue_pop <- lm(Total_Revenue ~ Population, data = Cleaned_4_MMDAs_Data)
summary(model_revenue_pop)
## 
## Call:
## lm(formula = Total_Revenue ~ Population, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -26468384 -15556098  -1699448  12888670  56700610 
## 
## Coefficients:
##                 Estimate   Std. Error t value   Pr(>|t|)    
## (Intercept) 26369062.420  4978752.876   5.296 0.00000524 ***
## Population        10.723        2.784   3.852   0.000437 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20110000 on 38 degrees of freedom
## Multiple R-squared:  0.2808, Adjusted R-squared:  0.2619 
## F-statistic: 14.84 on 1 and 38 DF,  p-value: 0.0004366
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = Total_Revenue)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Total Revenue vs Population", x = "Population", y = "Total Revenue") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma)

# Total Expenditure vs Population
model_expenditure_pop <- lm(Total_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
summary(model_expenditure_pop)
## 
## Call:
## lm(formula = Total_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -26429954 -15426864  -2272971  12574424  49267490 
## 
## Coefficients:
##                 Estimate   Std. Error t value  Pr(>|t|)    
## (Intercept) 25060267.648  4949117.374   5.064 0.0000109 ***
## Population        11.292        2.767   4.081  0.000222 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19990000 on 38 degrees of freedom
## Multiple R-squared:  0.3047, Adjusted R-squared:  0.2864 
## F-statistic: 16.65 on 1 and 38 DF,  p-value: 0.0002219
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = Total_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Total Expenditure vs Population", x = "Population", y = "Total Expenditure") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma)

# Capital Expenditure vs Total Revenue and IGF_TE
model_capital_rev_igf <- lm(Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_4_MMDAs_Data)
summary(model_capital_rev_igf)
## 
## Call:
## lm(formula = Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -12647433  -3540160   -724350   2904795  25463778 
## 
## Coefficients:
##                      Estimate      Std. Error t value   Pr(>|t|)    
## (Intercept)     2528523.53326   2878542.65189   0.878     0.3854    
## Total_Revenue         0.31826         0.05862   5.430 0.00000372 ***
## IGF_TE        -11508662.07997   6569938.76021  -1.752     0.0881 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7558000 on 37 degrees of freedom
## Multiple R-squared:  0.4504, Adjusted R-squared:  0.4206 
## F-statistic: 15.16 on 2 and 37 DF,  p-value: 0.00001554
ggplot(Cleaned_4_MMDAs_Data, aes(x = Total_Revenue, y = Capital_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Capital Expenditure vs Total Revenue", x = "Total Revenue", y = "Capital Expenditure") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = IGF_TE)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "IGF_TE vs Population", x = "Population", y = "IGF_TE") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = percent_format(accuracy = 1))

# IGF_TE vs Population and Total Revenue
model_igfte_pop_rev <- lm(IGF_TE ~ Population + Total_Revenue, data = Cleaned_4_MMDAs_Data)
summary(model_igfte_pop_rev)
## 
## Call:
## lm(formula = IGF_TE ~ Population + Total_Revenue, data = Cleaned_4_MMDAs_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.26068 -0.13832 -0.04149  0.16199  0.55307 
## 
## Coefficients:
##                      Estimate      Std. Error t value Pr(>|t|)    
## (Intercept)    0.239981076202  0.061202888928   3.921 0.000368 ***
## Population    -0.000000024482  0.000000030605  -0.800 0.428864    
## Total_Revenue  0.000000004845  0.000000001513   3.203 0.002793 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1875 on 37 degrees of freedom
## Multiple R-squared:  0.2353, Adjusted R-squared:  0.1939 
## F-statistic: 5.691 on 2 and 37 DF,  p-value: 0.007
ggplot(Cleaned_4_MMDAs_Data, aes(x = Total_Revenue, y = IGF_TE)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "IGF_TE vs Total Revenue", x = "Total Revenue", y = "IGF_TE") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = percent_format(accuracy = 1))

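The regression results above show significant positive linear relationships between Population and Total Revenue (p = 0.0004) and between Population and Total Expenditure (p = 0.0002). Total Revenue is a significant predictor of Capital Expenditure (p < 0.001), while IGF_TE is only marginally significant in that model (p = 0.0881). In the IGF_TE model, Total Revenue is significant (p = 0.0028) but Population is not (p = 0.4289).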

2.4 What is the relationship between service delivery and revenue and expenditure patterns?

# no variables

2.5 SHEET 3